1 Overview

1.1 Description of the project

This document contains the results of the statistical analysis for … project. According to the study protocol, the following analyses are to be performed:

  • The relationship between the outcome and independent variables such as age, height, weight, ethnicity, and BMI will be examined with regression models.

  • For each relationship between the outcome and an independent variable, both the linear form and transformed forms (e.g., quadratic, power, log) will be examined.

1.2 Predictors

The following variables have been collected during the visit and entered into the REDCap system:

  • Age
  • Sex
  • Ethnicity (Caucasian/Black/Other)
  • Weight (kg)
  • Height (cm)
  • BMI

Data sets used:

  • Outcome_dataset1.csv
  • Outcome_dataset2.csv
  • REDCap: dataset3.csv

1.3 Table 1 (Randomized)

The data set contains data for n = 400 participants marked as “randomized”.
Table 1 contains the demographic characteristics of all randomized participants.

Table 1 (all randomized participants)

Variable          Female            Male              Overall
Age
  Count           153               247               400
  Mean (SD)       10.91 (3.30)      10.34 (3.55)      10.56 (3.46)
  Median (IQR)    11.19 (4.34)      10.48 (5.11)      10.69 (4.72)
  Q1, Q3          8.64, 12.98       7.78, 12.89       8.23, 12.95
  Min, Max        0.87, 19.91       0.20, 19.34       0.20, 19.91
  Missing         0                 0                 0

Height (cm)
  Count           153               247               400
  Mean (SD)       143.78 (19.97)    140.97 (21.34)    142.05 (20.84)
  Median (IQR)    143.95 (27.10)    142.95 (29.58)    143.00 (28.78)
  Q1, Q3          130.48, 157.58    127.59, 157.17    128.52, 157.30
  Min, Max        86.12, 191.91     87.27, 194.77     86.12, 194.77
  Missing         0                 0                 0

Weight (kg)
  Count           153               247               400
  Mean (SD)       38.50 (10.86)     36.04 (12.19)     36.98 (11.75)
  Median (IQR)    39.99 (14.80)     36.77 (16.14)     37.58 (15.60)
  Q1, Q3          30.95, 45.76      28.38, 44.52      29.28, 44.88
  Min, Max        10.65, 67.66      -5.48, 68.94      -5.48, 68.94
  Missing         0                 0                 0

Outcome
  Count           153               247               400
  Mean (SD)       0.52 (0.22)       0.55 (0.23)       0.54 (0.22)
  Median (IQR)    0.49 (0.31)       0.55 (0.31)       0.54 (0.31)
  Q1, Q3          0.37, 0.68        0.40, 0.71        0.38, 0.70
  Min, Max        -0.05, 1.00       -0.09, 1.10       -0.09, 1.10
  Missing         0                 0                 0

Gender
  Count (%)       153 (38.25%)      247 (61.75%)      400
  (Col %)
  Female          153 (100.00%)     0 (0.00%)         153 (38.25%)
  Male            0 (0.00%)         247 (100.00%)     247 (61.75%)
  Missing         0                 0                 0

Ethnicity
  Count (%)       153 (38.25%)      247 (61.75%)      400
  (Row %)
  Caucasian       110 (38.19%)      178 (61.81%)      288 (100.00%)
  Other           43 (38.39%)       69 (61.61%)       112 (100.00%)
  Missing         0                 0                 0

BMI
  Count           153               247               400
  Mean (SD)       18.24 (1.98)      17.45 (2.86)      17.76 (2.58)
  Median (IQR)    18.22 (2.52)      17.69 (2.79)      17.89 (2.57)
  Q1, Q3          17.01, 19.53      16.44, 19.23      16.72, 19.29
  Min, Max        11.52, 23.84      -7.19, 23.59      -7.19, 23.84
  Missing         0                 0                 0
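
Table 1 reports a minimum weight of -5.48 kg and a minimum BMI of -7.19 in the male group, values that are not physically possible. A simple range check on the merged data could flag such records for review before modelling. The following Python sketch illustrates the idea only; it is not the code used for this analysis, and the field names (weight_kg, height_cm, bmi) and the two sample records are hypothetical.

```python
def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2 from weight in kg and height in cm."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def implausible_fields(record):
    """Return the names of fields whose values are not physically possible."""
    flags = []
    for field in ("weight_kg", "height_cm", "bmi"):
        if record[field] <= 0:
            flags.append(field)
    return flags


# Hypothetical records, not taken from the study data.
records = [
    {"id": 1, "weight_kg": 37.0, "height_cm": 142.0, "bmi": 18.35},
    {"id": 2, "weight_kg": -5.5, "height_cm": 150.0, "bmi": -2.44},
]

# Map each flagged record id to the list of fields that failed the check.
flagged = {}
for r in records:
    bad = implausible_fields(r)
    if bad:
        flagged[r["id"]] = bad

print(flagged)
print(round(bmi(37.0, 142.0), 2))
```

Whether a flagged record is corrected, set to missing, or excluded is a decision for the study team; the sketch only identifies candidates.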

2 Nature of Variables (description)

Dependent variables: All dependent variables are continuous.

Independent variables: may be continuous or categorical.
• Age: continuous
• Weight: continuous
• Height: continuous
• BMI: continuous
• Sex: dichotomous categorical
• Ethnicity: categorical

2.1 Independent variables

2.1.1 Multicollinearity

The data set is to be checked for multicollinearity between the independent variables. A correlation analysis is to be performed, and the VIF (Variance Inflation Factor) is to be examined.
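
For continuous predictors, the VIF of predictor j equals 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors; equivalently, the VIFs are the diagonal of the inverse of the predictors' correlation matrix. The following Python sketch uses the second form. It is an illustration under stated assumptions, not the routine used in this report: it covers continuous predictors only, and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF per predictor: diagonal of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


# Hypothetical toy data: age and height are nearly collinear.
age = [2, 4, 6, 8, 10, 12]
height = [88, 101, 115, 127, 139, 150]
bmi = [16.0, 15.2, 17.1, 15.8, 18.0, 16.5]

for name, value in vif({"age": age, "height": height, "bmi": bmi}).items():
    print(name, round(value, 2))
```

A VIF near 1 indicates little shared variance with the other predictors; commonly quoted rules of thumb treat values above 5 or 10 as a sign of problematic collinearity.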

2.1.2 Spearman Correlation:
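
The Spearman coefficient is the Pearson correlation computed on the ranks of the two variables, with tied values given the mean of their ranks. The Python sketch below shows that computation; it is a minimal illustration with hypothetical data, not the code that produced the correlation plots in this section.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: a monotonic but non-linear relationship.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(spearman(x, y))
```

Because only ranks are used, the coefficient captures monotonic association and is less sensitive to extreme values than the Pearson coefficient.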

2.1.2.1 All participants:

2.1.2.2 Males:

2.1.2.3 Females:

2.1.3 Distribution of IV (by sex)

Each histogram shows the distribution of an independent variable by sex; the distribution of the whole data set is shown in grey in the background. Visual examination helps to identify possible outliers or extreme values and to judge whether a transformation should be applied.

2.1.3.1 Age

2.1.3.2 Height

2.1.3.3 Weight

2.2 Test for Normality

The Shapiro-Wilk normality test was performed for the variables listed below to examine whether each follows a normal distribution.

Variable        Statistic (W)   P-value
Age             0.9965          0.52405
Height_stand    0.9946          0.17119
Weight          0.9942          0.13656
Outcome         0.9949          0.21135

2.2.1 Difference by Gender

The outcome values are compared between groups, by gender and by ethnicity.

2.2.1.1 by Gender

.y.       group1      group2   p       p.adj   p.format   p.signif   method
Outcome   Female      Male     0.161   0.16    0.16       ns         Wilcoxon

2.2.1.2 by Ethnicity

.y.       group1      group2   p       p.adj   p.format   p.signif   method
Outcome   Caucasian   Other    0.966   0.97    0.97       ns         Wilcoxon
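
The tables above name "Wilcoxon" as the method. For two independent groups this is the Wilcoxon rank-sum (Mann-Whitney) test. The Python sketch below assumes the two-sided test with the normal approximation, a continuity correction, and a tie correction; the software used for this report may compute the p-value differently, so small discrepancies are possible. The sample values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(a, b):
    """Two-sided rank-sum test; returns (W statistic for sample a, p-value).

    Uses the normal approximation; undefined if all pooled values are equal.
    """
    n1, n2 = len(a), len(b)
    n = n1 + n2
    pooled = list(a) + list(b)
    ranks = average_ranks(pooled)
    w = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    diff = w - n1 * n2 / 2.0
    # Continuity correction of 0.5 towards the null value.
    correction = 0.5 if diff > 0 else (-0.5 if diff < 0 else 0.0)
    z = (diff - correction) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return w, p_value


# Hypothetical samples with no overlap between the groups.
w, p = rank_sum_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(w, round(p, 4))
```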

2.2.2 Association of outcomes and IV (by sex)

The association of the outcome variables with the independent variables is examined, stratified by sex.

2.2.3 Bivariate and multivariable association

The bivariate and multivariable associations between the IOS variables and the independent variables (IV) are examined. The main aim is to identify significant bivariate associations and, taking into account the high correlation between the IVs, to select the best candidates for the final equation while avoiding multicollinearity.

2.2.4 Outcome1

lm(Outcome ~ Age + Gender + Caucasian + Height_stand + Weight + bmi)

Dependent: Outcome          Coefficient (univariable)             Coefficient (multivariable)
Age         [0.2,19.9]      -0.053 (-0.057 to -0.050, p<0.001)    -0.041 (-0.053 to -0.030, p<0.001)
Gender      Female          (reference)                           (reference)
            Male            0.031 (-0.014 to 0.077, p=0.173)      0.005 (-0.021 to 0.031, p=0.696)
Ethnicity   Caucasian       (reference)                           (reference)
            Other           0.002 (-0.047 to 0.052, p=0.921)      -0.002 (-0.030 to 0.025, p=0.870)
Height      [86.1,194.8]    -0.009 (-0.009 to -0.008, p<0.001)    -0.002 (-0.005 to 0.002, p=0.289)
Weight      [-5.5,68.9]     -0.014 (-0.016 to -0.013, p<0.001)    -0.001 (-0.007 to 0.005, p=0.759)
BMI         [-7.2,23.8]     -0.019 (-0.027 to -0.010, p<0.001)    0.006 (-0.003 to 0.016, p=0.186)

Number in dataframe = 400, Number in model = 400, Missing = 0, Log-likelihood = 266.3, AIC = -516.6, R-squared = 0.69, Adjusted R-squared = 0.69

Variable    VIF
Age         10.27
Gender      1.02
Ethnicity   1.01
Height      33.12
Weight      34.23
BMI         3.92
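
Two of the reported quantities can be cross-checked by hand. First, AIC = 2k - 2 log L; assuming k = 8 estimated parameters (six slopes, the intercept, and the residual variance), the reported log-likelihood of 266.3 reproduces the reported AIC of -516.6. Second, since VIF = 1 / (1 - R^2), a VIF of 33.12 for Height corresponds to roughly 97% of its variance being explained by the other predictors. The Python sketch below carries out both calculations; the parameter count is an assumption, not something stated in the model output.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * log_likelihood


def r_squared_from_vif(vif):
    """R^2 of a predictor on the other predictors, from VIF = 1 / (1 - R^2)."""
    return 1.0 - 1.0 / vif


# Assumption: 6 slopes + 1 intercept + 1 residual variance = 8 parameters.
N_PARAMS = 8
full_model_aic = aic(266.3, N_PARAMS)

# VIF values as reported for the continuous predictors.
reported_vif = {"Age": 10.27, "Height": 33.12, "Weight": 34.23, "BMI": 3.92}
shared = {name: r_squared_from_vif(v) for name, v in reported_vif.items()}

print(round(full_model_aic, 1))
for name, r2 in shared.items():
    print(name, round(r2, 3))
```

The very high shared variance of Height and Weight is consistent with their coefficients losing significance in the multivariable model although both are significant in the univariable models.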

2.3 Linearity diagnostics

We start with the Outcome1 model.

Because visual examination of the association of Outcome with the independent variables suggests a linear association between Outcome and both Age and Height, the following statistics are reported:

Comparison of models

Formula            Rank   Df.res   AIC      AICc     BIC      R.squared   Adj.R.sq   p.value   Shapiro.W   Shapiro.p
Outcome ~ Age      2      398      -516.5   -516.4   -504.5   0.682       0.681      0         0.995       0.180
Outcome ~ Height   2      398      -476.3   -476.2   -464.3   0.649       0.648      0         0.997       0.616
Outcome ~ Weight   2      398      -401.1   -401.0   -389.1   0.576       0.575      0         0.996       0.379
Outcome ~ BMI      2      398      -76.9    -76.8    -64.9    0.046       0.044      0         0.994       0.087
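
The AICc and BIC columns follow from the AIC column through the standard relations AICc = AIC + 2k(k + 1) / (n - k - 1) and BIC = AIC - 2k + k ln(n). The Python sketch below assumes k = 3 parameters per model (intercept, slope, residual variance), n = 400, and that the rows of the fit statistics follow the order of the listed formulas. Under those assumptions it reproduces the reported AICc and BIC values to one decimal place.

```python
import math


def aicc(aic, k, n):
    """Small-sample corrected AIC."""
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)


def bic(aic, k, n):
    """BIC expressed through AIC: swap the 2k penalty for k * ln(n)."""
    return aic - 2.0 * k + k * math.log(n)


N = 400  # participants in each model
K = 3    # assumption: intercept, slope, residual variance

# AIC values as reported, assuming rows follow the order of the formulas.
reported_aic = {
    "Outcome ~ Age": -516.5,
    "Outcome ~ Height": -476.3,
    "Outcome ~ Weight": -401.1,
    "Outcome ~ BMI": -76.9,
}

derived = {
    formula: (round(aicc(a, K, N), 1), round(bic(a, K, N), 1))
    for formula, a in reported_aic.items()
}

for formula, (aicc_value, bic_value) in derived.items():
    print(formula, aicc_value, bic_value)
```

Lower values indicate a better trade-off between fit and complexity, so on all three criteria the Age model ranks first and the BMI model last.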

The linear association between the IOS variables and the independent variables is verified next.

2.3.1 Verification of the linear association

The residual errors (in red) are the differences between the observed values and the fitted regression line. Each vertical red segment represents the residual error between an observed Outcome value and the corresponding predicted (i.e., fitted) value.
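
The residual for each observation is the observed value minus the fitted value. The Python sketch below fits a simple least-squares line and computes the residuals; it is a minimal illustration with hypothetical data, not the code that produced the plots in this section.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


def residuals(x, y):
    """Observed minus fitted value for each observation."""
    intercept, slope = fit_line(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


# Hypothetical data: an outcome that decreases with age.
age = [2, 4, 6, 8, 10, 12]
outcome = [0.95, 0.86, 0.71, 0.66, 0.52, 0.44]

intercept, slope = fit_line(age, outcome)
res = residuals(age, outcome)
print(round(intercept, 3), round(slope, 4))
print([round(r, 3) for r in res])
```

With an intercept in the model, least-squares residuals sum to zero and are uncorrelated with the predictor, which is why the diagnostic plots look for remaining pattern rather than for a non-zero mean.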

2.3.1.1 Residuals error

2.3.1.2 Linearity

The red line is approximately horizontal at zero, indicating little pattern in the residuals…

2.3.1.3 Normality of residuals

The QQ plot of residuals can be used to check the normality assumption visually. The normal probability plot of residuals should approximately follow a straight line. In our example, minimal deviation from the reference line is observed at the ….; other plots could suggest that the assumption of normality of residuals is violated.

2.3.1.4 Scale-Location

To check the homogeneity of variance of the residuals (homoscedasticity), we verify whether the points are spread equally around the horizontal line. This is observed for Age (no transformation) and Height_stand (no transformation).

2.4 Multivariable regression

Based on the previous results, the following models are to be compared for the IOS variables of interest.

  • Model 1: lm(Outcome ~ Age + Caucasian + Height)
  • Model 2: lm(Outcome ~ Age + Caucasian + Weight)
  • Model 3: lm(Outcome ~ Age + Height)
  • Model 4: lm(Outcome ~ Age + BMI)
  • Model 5: lm(Outcome ~ Height)
  • Model 6: lm(Outcome ~ Age)

Dependent: Outcome         Coefficient (univariable)             Coefficient (multivariable)
Model 3
  Age      [0.2,19.9]      -0.053 (-0.057 to -0.050, p<0.001)    -0.040 (-0.051 to -0.029, p<0.001)
  Height   [86.1,194.8]    -0.009 (-0.009 to -0.008, p<0.001)    -0.002 (-0.004 to -0.000, p=0.014)
Model 4
  Age      [0.2,19.9]      -0.053 (-0.057 to -0.050, p<0.001)    -0.055 (-0.058 to -0.051, p<0.001)
  BMI      [-7.2,23.8]     -0.019 (-0.027 to -0.010, p<0.001)    0.005 (0.000 to 0.010, p=0.044)
Model 5
  Height   [86.1,194.8]    -0.009 (-0.009 to -0.008, p<0.001)    -0.009 (-0.009 to -0.008, p<0.001)
Model 6
  Age      [0.2,19.9]      -0.053 (-0.057 to -0.050, p<0.001)    -0.053 (-0.057 to -0.050, p<0.001)